Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GEP-17] Add GEP-17 with an updated Control Plane Migration "Bad Case" Scenario description #4107

Merged
merged 5 commits into from
Sep 9, 2021

Conversation

stoyanr
Copy link
Contributor

@stoyanr stoyanr commented May 26, 2021

How to categorize this PR?

/area control-plane-migration
/area documentation
/kind enhancement
/priority normal

What this PR does / why we need it:

  • Removes the "bad case" scenario description from GEP-7.
  • Adds a new GEP-17 with an updated "bad case" scenario description aligned with the latest PoC implementation and results.

Which issue(s) this PR fixes:
Part of #1631

Special notes for your reviewer:
This PR supercedes #3066. While it builds upon the same main ideas, the description is restructured and aligned with the results of the PoC.

Release note:

NONE

@stoyanr stoyanr requested a review from a team as a code owner May 26, 2021 08:59
@gardener-robot gardener-robot added area/control-plane-migration Control plane migration related area/documentation Documentation related kind/enhancement Enhancement, improvement, extension labels May 26, 2021
@gardener-robot
Copy link

@stoyanr Label priority/normal does not exist.

@gardener-robot gardener-robot added needs/review size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 26, 2021
@stoyanr stoyanr marked this pull request as draft May 26, 2021 08:59
@stoyanr stoyanr changed the title Update GEP-7 with a new "bad case scenario" description [GEP-7] Update GEP-7 with a new "bad case scenario" description May 26, 2021
@stoyanr
Copy link
Contributor Author

stoyanr commented May 26, 2021

/cc @plkokanov @kris94

@stoyanr
Copy link
Contributor Author

stoyanr commented May 27, 2021

We consider this PR as ready for review now.
/cc @rfranzke @timebertt @mandelsoft

@stoyanr
Copy link
Contributor Author

stoyanr commented May 28, 2021

/cc @amshuman-kr @vlerenc

Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well written, thanks for the summary and the PoC, quite interesting! Happy to attend the meeting the next!

docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
Copy link
Member

@vlerenc vlerenc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the write-up. I had a few questions, please see inline.

docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
docs/proposals/07-shoot-control-plane-migration.md Outdated Show resolved Hide resolved
@gardener-robot gardener-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2021
@stoyanr stoyanr changed the title [GEP-7] Update GEP-7 with a new "bad case scenario" description [GEP-17] Add GEP-17 with an updated Control Plane Migration "Bad Case" Scenario description Aug 23, 2021
@gardener-robot
Copy link

@stoyanr Label priority/normal does not exist.

@stoyanr
Copy link
Contributor Author

stoyanr commented Aug 23, 2021

/cc @plkokanov @kris94 @rfranzke @timebertt @vlerenc @amshuman-kr @mandelsoft @MartinWeindel

To make the bad case scenario description easier to understand and discuss, I moved it to a new GEP (GEP-17). It was independent from the old GEP (GEP-7) already which is already implemented and released.

The new proposal is very different from the original one proposed about 3 months ago. It is significantly simpler, as it is based on the owner DNS records that could be meanwhile introduced after the DNSRecords feature was implemented. There are no more CopyOperation objects and "cluster leases".

I resolved all existing comments, since they were either no longer relevant (regarding parts that have been removed), or sufficiently well answered (at least from my perspective). As before, everything you see in the proposal has been implemented and tested as part of the PoC, and the productization of certain parts (copying of snapshots) is already ongoing.

@stoyanr stoyanr marked this pull request as ready for review August 23, 2021 13:53
@rfranzke rfranzke self-requested a review August 24, 2021 13:22
@rfranzke
Copy link
Member

/touch

@gardener-robot gardener-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 24, 2021
@vlerenc
Copy link
Member

vlerenc commented Sep 2, 2021

Thank you @stoyanr and colleagues. The new proposal can be understood very well and seems safer and also easier to build. A question that I would have is what happens if the destination seed cannot access the source seed's backup bucket. Should that happen, the migration needs to be put on hold (possibly indefinitely) as without a backup/snapshot there is no point in bringing back the cluster as it would be empty. How will this be guardrailed?

@stoyanr
Copy link
Contributor Author

stoyanr commented Sep 2, 2021

what happens if the destination seed cannot access the source seed's backup bucket. Should that happen, the migration needs to be put on hold (possibly indefinitely) as without a backup/snapshot there is no point in bringing back the cluster as it would be empty.

Yes, this is what will happen. The copying of snapshots task will fail, and the restoration flow will keep retrying it until it eventually succeeds, or until the shoot is marked as Failed after a certain timeout.

@stoyanr
Copy link
Contributor Author

stoyanr commented Sep 2, 2021

@rfranzke Thanks for your feedback! I added a new section to the GEP regarding Handling Inability to Resolve the Owner DNS Record and I answered all your other comments. I think no other changes to the GEP are required, let me know if you still miss something.

Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@rfranzke rfranzke merged commit 23267e4 into gardener:master Sep 9, 2021
krgostev pushed a commit to krgostev/gardener that referenced this pull request Apr 21, 2022
…" Scenario description (gardener#4107)

* Remove old base case description

* Update GEP-7 with a new bad case scenario description

* Move CPM bad case description to a new GEP

* Update GEP-17 with the latest PoC changes

* Add Handling Inability to Resolve the Owner DNS Record section
krgostev pushed a commit to krgostev/gardener that referenced this pull request Jul 5, 2022
…" Scenario description (gardener#4107)

* Remove old base case description

* Update GEP-7 with a new bad case scenario description

* Move CPM bad case description to a new GEP

* Update GEP-17 with the latest PoC changes

* Add Handling Inability to Resolve the Owner DNS Record section
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/control-plane-migration Control plane migration related area/documentation Documentation related kind/enhancement Enhancement, improvement, extension size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants